AITopics | descendant model

Collaborating Authors

descendant model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Cluster-Learngene: InheritingAdaptiveClustersfor VisionTransformers

Neural Information Processing SystemsFeb-10-2026, 12:15:00 GMT

Recent studies [42,56]have visualized the mean attention distance of ViTs, offering deeper insights into weight redundancy among attention heads across different layers.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

2e53c02ea028cbf603f4b6b47fef3d97-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 22:13:51 GMT

attention head, centroid, descendant model, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(4 more...)

Add feedback

Transformer as Linear Expansion of Learngene

Xia, Shiyu, Zhang, Miaosen, Yang, Xu, Chen, Ruiming, Chen, Haokun, Geng, Xin

arXiv.org Artificial IntelligenceDec-20-2023

We propose expanding the shared Transformer module to produce and initialize Transformers of varying depths, enabling adaptation to diverse resource constraints. Drawing an analogy to genetic expansibility, we term such module as learngene. To identify the expansion mechanism, we delve into the relationship between the layer's position and its corresponding weight value, and find that linear function appropriately approximates this relationship. Building on this insight, we present Transformer as Linear Expansion of learnGene (TLEG), a novel approach for flexibly producing and initializing Transformers of diverse depths. Specifically, to learn learngene, we firstly construct an auxiliary Transformer linearly expanded from learngene, after which we train it through employing soft distillation. Subsequently, we can produce and initialize Transformers of varying depths via linearly expanding the well-trained learngene, thereby supporting diverse downstream scenarios. Extensive experiments on ImageNet-1K demonstrate that TLEG achieves comparable or better performance in contrast to many individual models trained from scratch, while reducing around 2x training cost. When transferring to several downstream classification datasets, TLEG surpasses existing initialization methods by a large margin (e.g., +6.87% on iNat 2019 and +7.66% on CIFAR-100). Under the situation where we need to produce models of varying depths adapting for different resource constraints, TLEG achieves comparable results while reducing around 19x parameters stored to initialize these models and around 5x pre-training costs, in contrast to the pre-training and fine-tuning approach. When transferring a fixed set of parameters to initialize different models, TLEG presents better flexibility and competitive performance while reducing around 2.9x parameters stored to initialize, compared to the pre-training approach.

initialization, learngene, pre-fin, (13 more...)

arXiv.org Artificial Intelligence

2312.05614

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Building Variable-sized Models via Learngene Pool

Shi, Boyu, Xia, Shiyu, Yang, Xu, Chen, Haokun, Kou, Zhiqiang, Geng, Xin

arXiv.org Artificial IntelligenceDec-11-2023

Recently, Stitchable Neural Networks (SN-Net) is proposed to stitch some pre-trained networks for quickly building numerous networks with different complexity and performance trade-offs. In this way, the burdens of designing or training the variable-sized networks, which can be used in application scenarios with diverse resource constraints, are alleviated. However, SN-Net still faces a few challenges. 1) Stitching from multiple independently pre-trained anchors introduces high storage resource consumption. 2) SN-Net faces challenges to build smaller models for low resource constraints. 3). SN-Net uses an unlearned initialization method for stitch layers, limiting the final performance. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool. Briefly, Learngene distills the critical knowledge from a large pre-trained model into a small part (termed as learngene) and then expands this small part into a few variable-sized models. In our proposed method, we distill one pretrained large model into multiple small models whose network blocks are used as learngene instances to construct the learngene pool. Since only one large model is used, we do not need to store more large models as SN-Net and after distilling, smaller learngene instances can be created to build small models to satisfy low resource constraints. We also insert learnable transformation matrices between the instances to stitch them into variable-sized models to improve the performance of these models. Exhaustive experiments have been implemented and the results validate the effectiveness of the proposed Learngene Pool compared with SN-Net.

ancestry model, auxiliary model, learngene pool, (12 more...)

arXiv.org Artificial Intelligence

2312.05743

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models

Wang, Qiufeng, Yang, Xu, Lin, Shuxia, Wang, Jing, Geng, Xin

arXiv.org Artificial IntelligenceJun-29-2023

During the continuous evolution of one organism's ancestry, its genes accumulate extensive experiences and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm Learngene to enable learning models to incorporate three key characteristics of genes. (i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms like large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which induces three key issues and we provide the preliminary solutions to these issues in this paper: (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify which layers among the ancestry model have the most similarity as one pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for the specific downstream tasks, we stack some randomly initialized layers to the learngene layers. Extensive experiments across various settings, including using different network architectures like Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) on different datasets, are carried out to confirm four advantages of Learngene: it makes the descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.

artificial intelligence, descendant model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.02279

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia (0.04)
North America > Canada > Quebec > Montreal (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Continuing Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback